LAPACK Cholesky Routines in Rectangular Full Packed Format
Abstract
We describe a new data format, called RFP (Rectangular Full Packed), for storing triangular and symmetric matrices. The standard two-dimensional arrays of Fortran and C (also known as full format) that are used to store triangular and symmetric matrices waste nearly half the storage space but provide high performance through level 3 BLAS. Packed format arrays fully utilize storage (array space) but provide low performance because there are no level 3 packed BLAS. RFP format combines the good features of packed and full storage: because an RFP array is itself a full-format array, level 3 BLAS can be applied to it directly, and it requires exactly the same minimal storage as packed format. Each full and/or packed symmetric/triangular routine becomes a single new RFP routine. To illustrate this work, we present LAPACK routines for Cholesky factorization and inverse computation in RFP format and report their performance on Intel, IBM, Itanium, SGI, and Sun platforms. The RFP routines perform about the same as the LAPACK full format routines while using about half the storage, and they are roughly one to twenty times faster than the LAPACK packed routines while using the same storage. The performance study used only existing LAPACK routines and level 3 BLAS. We also describe codes to directly input RFP format arrays; these are similar to codes for inputting full format arrays.

1 Description of Rectangular Full Packed Format

We describe RFP format. It represents a standard packed array as a full 2D array, which means that LAPACK's [2] packed format routines can perform as well as or better than their full array counterparts. RFP format is a variant of hybrid full packed (HFP) format [1]. It rearranges a standard full rectangular array SA holding a symmetric/triangular matrix A into a compact full storage rectangular array AR that uses the minimal storage NT = N(N+1)/2. Note also that the transpose of the matrix in array AR also represents A.
Therefore, level 3 BLAS can be used on AR or its transpose. In fact, with the equivalent LAPACK algorithm, using array AR or its transpose instead ...
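To make the layout concrete, here is a minimal Python sketch of packing the lower triangle of an N x N symmetric matrix into RFP. It assumes the TRANSR = 'N', UPLO = 'L' variant with column-major storage; the packing loop follows our reading of LAPACK's DTRTTF for that case, and the names pack_rfp_lower, rfp_get, and rfp_blocks_even are hypothetical. The point it illustrates is that, for even N with k = N/2, the RFP array is an ordinary (N+1) x k full array with leading dimension N+1 (21 entries for N = 6, versus 36 in full storage). The three pieces a Cholesky routine needs are each plain strided sub-arrays: the lower triangle T1 of A[0:k, 0:k], the square block S = A[k:N, 0:k], and the trailing triangle T2 of A[k:N, k:N] stored as an upper triangle.

```python
def pack_rfp_lower(a):
    """Pack the lower triangle of the n x n symmetric matrix `a` (list of rows)
    into a flat column-major RFP array (assumed TRANSR='N', UPLO='L' layout).
    Returns (arf, lda, ncols)."""
    n = len(a)
    n2 = n // 2          # order of the trailing diagonal block
    n1 = n - n2          # order of the leading block = number of RFP columns
    arf = []
    for j in range(n1):
        # Top of column j: row n2+j of the trailing triangle, so that
        # triangle ends up stored transposed (as an upper triangle).
        for i in range(n1, n2 + j + 1):
            arf.append(a[n2 + j][i])
        # Remainder of column j: column j of A's lower triangle, rows j..n-1.
        for i in range(j, n):
            arf.append(a[i][j])
    lda = n + 1 if n % 2 == 0 else n   # leading dimension of the RFP array
    return arf, lda, n1


def rfp_get(arf, lda, p, q):
    """Entry (p, q), 0-based, of the column-major RFP array."""
    return arf[p + q * lda]


def rfp_blocks_even(arf, n):
    """For even n, read T1, S, T2 as ordinary k x k blocks with stride lda = n+1.
    T1 starts at row 1, S at row k+1, T2 at row 0. Entries outside a stored
    triangle are returned as 0."""
    if n % 2 != 0:
        raise ValueError("this sketch only handles even n")
    k = n // 2
    lda = n + 1
    t1 = [[rfp_get(arf, lda, 1 + i, j) if i >= j else 0 for j in range(k)]
          for i in range(k)]
    s = [[rfp_get(arf, lda, k + 1 + i, j) for j in range(k)] for i in range(k)]
    t2 = [[rfp_get(arf, lda, i, j) if i <= j else 0 for j in range(k)]
          for i in range(k)]
    return t1, s, t2


if __name__ == "__main__":
    n = 6
    # Label entry (i, j) of a symmetric matrix as 10*max(i, j) + min(i, j).
    a = [[10 * max(i, j) + min(i, j) for j in range(n)] for i in range(n)]
    arf, lda, ncols = pack_rfp_lower(a)
    print(len(arf), n * (n + 1) // 2, lda, ncols)  # 21 21 7 3
```

With this even-N layout, a Cholesky factorization can, as we understand LAPACK's DPFTRF, be expressed as four full-format calls: DPOTRF on T1, DTRSM to form S := S L11^{-T}, DSYRK to update T2 := T2 - S S^T, and DPOTRF on T2. For odd N the same packing loop applies with leading dimension N; the block boundaries shift and are not shown here.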
Similar resources
Rectangular Full Packed Format for LAPACK Algorithms Timings on Several Computers
We describe a new data format for storing triangular and symmetric matrices called RFP (Rectangular Full Packed). The standard two-dimensional arrays of Fortran and C (also known as full format) that are used to store triangular and symmetric matrices waste nearly half the storage space but provide high performance via the use of level 3 BLAS. Standard packed format arrays fully utilize storage...
High Performance Cholesky Factorization via Blocking and Recursion That Uses Minimal Storage
We present a high performance Cholesky factorization algorithm, called BPC for Blocked Packed Cholesky, which performs as well as or better than the LAPACK DPOTRF subroutine but has about the same memory requirements as the LAPACK DPPTRF subroutine, which runs at level 2 BLAS speed. Algorithm BPC only calls DGEMM and level 3 kernel routines. It combines a recursive algorithm with blocking and ...
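The summary above mentions combining recursion with blocking. As an illustration of the recursive half only (this is not BPC itself and uses neither its packed block layout nor its kernels), a recursive Cholesky on full storage splits the matrix in two and reduces the work to a factorization, a triangular solve, a symmetric rank-k update, and another factorization. The function name chol_recursive is hypothetical; in a tuned code, steps 2 and 3 would be level 3 BLAS calls such as DTRSM and DSYRK, and the recursion would stop at a block size rather than at 1 x 1.

```python
import math


def chol_recursive(a, lo=0, hi=None):
    """In-place recursive Cholesky of the symmetric positive definite block
    a[lo:hi][lo:hi] (list of rows). Only the lower triangle is read; on
    return it holds L with A = L L^T. The upper triangle is left untouched."""
    if hi is None:
        hi = len(a)
    n = hi - lo
    if n == 1:
        if a[lo][lo] <= 0:
            raise ValueError("matrix is not positive definite")
        a[lo][lo] = math.sqrt(a[lo][lo])
        return
    mid = lo + n // 2
    # 1. Factor the leading block: A11 = L11 L11^T.
    chol_recursive(a, lo, mid)
    # 2. Triangular solve L21 L11^T = A21, column by column (TRSM-like).
    for i in range(mid, hi):
        for j in range(lo, mid):
            s = a[i][j]
            for p in range(lo, j):
                s -= a[i][p] * a[j][p]
            a[i][j] = s / a[j][j]
    # 3. Symmetric update of the lower triangle: A22 -= L21 L21^T (SYRK-like).
    for i in range(mid, hi):
        for j in range(mid, i + 1):
            s = 0.0
            for p in range(lo, mid):
                s += a[i][p] * a[j][p]
            a[i][j] -= s
    # 4. Factor the updated trailing block recursively.
    chol_recursive(a, mid, hi)


if __name__ == "__main__":
    a = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
    chol_recursive(a)
    print([row[: i + 1] for i, row in enumerate(a)])
    # [[2.0], [1.0, 2.0], [1.0, 1.0, 2.0]]
```

Splitting at n // 2 keeps both halves close to square, which is what lets the off-diagonal work be expressed as large matrix-matrix operations.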
Optimizing Locality of Reference in Cholesky Algorithms
This paper presents the principal ideas involved in hierarchical blocking, introduces the block packed storage scheme, and gives the implementation details and the performance rates of the hierarchically blocked Cholesky factorization. In some cases the newly developed routines are an order of magnitude faster than the corresponding LAPACK routines. Introduction: Most current computers based ...
Three Algorithms for Cholesky Factorization on Distributed Memory Using Packed Storage
We present three algorithms for Cholesky factorization using minimum block storage for a distributed memory (DM) environment. One of the distributed square blocked packed (SBP) format algorithms performs similarly to ScaLAPACK PDPOTRF, and with iteration overlapping outperforms it by as much as 67%. By storing the blocks in a standard contiguous way, we get better performing BLAS operations. Our ...
A distributed packed storage for large dense parallel in-core calculations
In this paper we propose a distributed packed storage format that exploits the symmetry or the triangular structure of a dense matrix. This format stores only half of the matrix while maintaining most of the efficiency of full storage for a wide range of operations. This work has been motivated by the fact that, contrary to sequential linear algebra libraries (e.g., LAPACK [4]), there...